Policy-Adaptive Estimator Selection for Off-Policy Evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance of
counterfactual policies using only offline logged data. Although many
estimators have been developed, there is no single estimator that dominates the
others, because the estimators' accuracy can vary greatly depending on a given
OPE task such as the evaluation policy, number of actions, and noise level.
Thus, the data-driven estimator selection problem is becoming increasingly
important and can have a significant impact on the accuracy of OPE. However,
identifying the most accurate estimator using only the logged data is quite
challenging because the ground-truth estimation accuracy of estimators is
generally unavailable. This paper studies this challenging problem of estimator
selection for OPE for the first time. In particular, we enable an estimator
selection that is adaptive to a given OPE task, by appropriately subsampling
available logged data and constructing pseudo policies useful for the
underlying estimator selection task. Comprehensive experiments on both
synthetic and real-world company data demonstrate that the proposed procedure
substantially improves the estimator selection compared to a non-adaptive
heuristic.
Comment: accepted at AAAI'2
Off-Policy Evaluation of Ranking Policies under Diverse User Behavior
Ranking interfaces are everywhere in online platforms. There is thus an ever
growing interest in their Off-Policy Evaluation (OPE), aiming towards an
accurate performance evaluation of ranking policies using logged data. A
de-facto approach for OPE is Inverse Propensity Scoring (IPS), which provides
an unbiased and consistent value estimate. However, it becomes extremely
inaccurate in the ranking setup due to its high variance under large action
spaces. To deal with this problem, previous studies assume either independent
or cascade user behavior, resulting in some ranking versions of IPS. While
these estimators are somewhat effective in reducing the variance, all existing
estimators apply a single universal assumption to every user, causing excessive
bias and variance. Therefore, this work explores a far more general formulation
where user behavior is diverse and can vary depending on the user context. We
show that the resulting estimator, which we call Adaptive IPS (AIPS), can be
unbiased under any complex user behavior. Moreover, AIPS achieves the minimum
variance among all unbiased estimators based on IPS. We further develop a
procedure to identify the appropriate user behavior model to minimize the mean
squared error (MSE) of AIPS in a data-driven fashion. Extensive experiments
demonstrate that the empirical accuracy improvement can be significant,
enabling effective OPE of ranking systems even under diverse user behavior.
Comment: KDD2023 Research trac
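For readers unfamiliar with the IPS baseline mentioned in the abstract above, the following is a minimal sketch of the standard (non-ranking) IPS estimator on a toy bandit problem. It is not the ranking-specific estimators or AIPS from the paper. The names `ips_estimate`, `logging_policy`, `target_policy`, and the reward table are hypothetical and chosen only for illustration.

```python
import random

def ips_estimate(logged, target_policy):
    """Inverse Propensity Scoring estimate of a target policy's value.

    logged: list of (context, action, reward, logging_propensity) tuples.
    target_policy: function (context, action) -> action probability
                   under the policy being evaluated.
    """
    total = 0.0
    for context, action, reward, propensity in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logged)

# Hypothetical toy problem: 3 actions, no context, Bernoulli rewards.
N_ACTIONS = 3
EXPECTED_REWARD = [0.9, 0.5, 0.1]  # assumed values, for illustration only

def logging_policy(context, action):
    return 1.0 / N_ACTIONS  # uniform logging policy

def target_policy(context, action):
    return 0.8 if action == 0 else 0.1

# Ground-truth value of the target policy (known here only because
# the toy reward table is known).
true_value = sum(
    target_policy(None, a) * EXPECTED_REWARD[a] for a in range(N_ACTIONS)
)

# Simulate logged data collected under the logging policy.
rng = random.Random(0)
logged = []
for _ in range(20000):
    a = rng.randrange(N_ACTIONS)
    r = 1.0 if rng.random() < EXPECTED_REWARD[a] else 0.0
    logged.append((None, a, r, logging_policy(None, a)))

estimate = ips_estimate(logged, target_policy)
print(f"true value: {true_value:.3f}, IPS estimate: {estimate:.3f}")
```

With only three actions the importance weights stay small. In a ranking setup the action space is the set of possible rankings, so the weights can become very large, which is the variance problem the abstract describes.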
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs)
with general function approximation. Existing methods such as sequential
importance sampling estimators and fitted-Q evaluation suffer from the curse of
horizon in POMDPs. To circumvent this problem, we develop a novel model-free
OPE method by introducing future-dependent value functions that take future
proxies as inputs. Future-dependent value functions play a role similar to that of
classical value functions in fully observable MDPs. We derive a new Bellman
equation for future-dependent value functions as conditional moment equations
that use history proxies as instrumental variables. We further propose a
minimax learning method to learn future-dependent value functions using the new
Bellman equation. We obtain a PAC result, which implies that our OPE estimator is
consistent as long as futures and histories contain sufficient information
about latent states and Bellman completeness holds. Finally, we extend our
methods to learning of dynamics and establish the connection between our
approach and the well-known spectral learning methods in POMDPs.
Comment: This paper was accepted in NeurIPS 202
Effectiveness of a digital device providing real-time visualized tooth brushing instructions: A randomized controlled trial
Introduction: The aim of this trial was to investigate whether a digital device that provides real-time visualized brushing instructions would contribute to the removal of dental plaque over usual brushing instructions. Methods: We conducted a single-center, parallel-group, stratified permuted block randomized controlled trial with a 1:1 allocation ratio. Eligible participants were people aged ≥ 18 years; we excluded people who met any of the following criteria: having severely crowded teeth; using an interdental cleaning implement; having an external injury in the oral cavity or stomatitis; having fewer than 20 teeth; using an orthodontic apparatus; having visited a dental clinic; having the possibility of consulting a dental clinic; holding a dental license; not owning a smartphone or tablet device; being a smoker; having taken antibiotics; being pregnant; having an allergy to the staining fluid; or being an employee of Sunstar Inc. All participants received tooth brushing instructions using video materials and were randomly assigned to one of two groups for four weeks: (1) an intervention group that used the digital device, which provided real-time visualized instructions through a connected mobile application; and (2) a control group that used a digital device which only collected their brushing logs. The primary outcome was the change in the 6-point method plaque control record (PCR) score of all teeth between baseline and week 4. The t-test was used to compare the two groups in accordance with intention-to-treat principles. Results: Among 118 enrolled individuals, 112 participants were eligible for our analyses. The mean PCR score at week 4 was 45.05% in the intervention group and 49.65% in the control group, and the change in PCR score from baseline was −20.46% in the intervention group and −15.77% in the control group (p = 0.088, 95% confidence interval −0.70–10.07). Conclusions: A digital device providing real-time visualized brushing instructions may be effective for the removal of dental plaque.
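As a quick consistency check on the figures reported above, the sketch below recomputes the between-group difference in PCR change and compares it with the midpoint of the quoted confidence interval. It assumes the interval refers to the difference in change scores taken as control minus intervention, which the abstract does not state explicitly. All numbers are copied from the abstract; the variable names are illustrative.

```python
# Values quoted in the abstract (percentage points).
change_intervention = -20.46
change_control = -15.77
ci_low, ci_high = -0.70, 10.07

# Assumed direction of the comparison: control minus intervention.
difference = change_control - change_intervention

# For a symmetric t-based interval, the midpoint is the point estimate.
ci_midpoint = (ci_low + ci_high) / 2

print(f"difference in change: {difference:.2f}")
print(f"CI midpoint:          {ci_midpoint:.3f}")
print(f"interval includes 0:  {ci_low < 0 < ci_high}")
```

Under that assumption the difference and the interval midpoint agree to rounding, and the interval spans zero, which matches the reported p-value being above 0.05.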
Prehospital cardiopulmonary resuscitation duration and neurological outcome after out-of-hospital cardiac arrest among children by location of arrest: a nationwide cohort study
Background: Little is known about the associations between the duration of prehospital cardiopulmonary resuscitation (CPR) by emergency medical services (EMS) and outcomes among paediatric patients with out-of-hospital cardiac arrests (OHCAs). We investigated these associations and the optimal prehospital EMS CPR duration by the location of arrests. Methods: We included paediatric patients aged 0–17 years with OHCAs before EMS arrival who were transported to medical institutions after resuscitation by bystanders or EMS personnel. We excluded paediatric OHCA patients for whom CPR was not performed, who had cardiac arrest after EMS arrival, whose EMS CPR duration were 30 min) in both groups (1.4% [6/417] in residential locations and 0.6% [1/170] in public locations). Conclusions: A longer prehospital EMS CPR duration is independently associated with a lower proportion of patients with a favourable neurological outcome. The association between prehospital EMS CPR duration and neurological outcome differed significantly by location of arrests.